Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
A honeynet is a promising active cyber defense mechanism. It reveals the
fundamental Indicators of Compromise (IoCs) by luring attackers to conduct
adversarial behaviors in a controlled and monitored environment. The active
interaction at the honeynet brings a high reward but also introduces high
implementation costs and risks of adversarial honeynet exploitation. In this
work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to
characterize the stochastic transitions and sojourn times of attackers in the
honeynet and quantify the reward-risk trade-off. In particular, we design
adaptive long-term engagement policies shown to be risk-averse, cost-effective,
and time-efficient. Numerical results have demonstrated that our adaptive
engagement policies can quickly attract attackers to the target honeypot and
engage them for a sufficiently long period to obtain worthy threat information.
Meanwhile, the penetration probability is kept at a low level. The results show
that the expected utility is robust against attackers with a wide range of
persistence and intelligence. Finally, we apply reinforcement learning to the
SMDP to solve the curse of modeling. Under a prudent choice of the learning
rate and exploration policy, we achieve a quick and robust convergence of the
optimal policy and value.
Comment: The presentation can be found at https://youtu.be/GPKT3uJtXqk. arXiv admin note: text overlap with arXiv:1907.0139
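
The abstract's last step, applying reinforcement learning to an SMDP, can be illustrated with a minimal sketch of tabular SMDP Q-learning. This is not the authors' implementation: the three-state toy model, its reward rates, sojourn rates, and transition probabilities, and the names discounted_reward, smdp_q_update, toy_step, and train are all hypothetical, chosen only to show how a random sojourn time enters the update through the discount factor exp(-gamma * tau).

```python
import math
import random


def discounted_reward(rate, tau, gamma):
    """Reward accrued at constant `rate` over a sojourn of length `tau`,
    discounted continuously at rate `gamma`."""
    return rate * (1.0 - math.exp(-gamma * tau)) / gamma


def smdp_q_update(q, s, a, reward, tau, s_next, alpha, gamma):
    """One SMDP Q-learning step. Unlike the MDP case, the discount applied
    to the next state's value depends on the sojourn time `tau`."""
    target = reward + math.exp(-gamma * tau) * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Hypothetical toy model (an assumption, not the paper's honeynet):
# state 0 = attacker at an ordinary node, state 1 = attacker at the target
# honeypot, state 2 = absorbing (engagement has ended).
ABSORBING = 2
REWARD_RATE = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.5}
SOJOURN_RATE = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.5}
P_TO_TARGET = {(0, 0): 0.8, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.7}


def toy_step(s, a, rng):
    """Sample an exponential sojourn time and the next state."""
    tau = rng.expovariate(SOJOURN_RATE[(s, a)])
    s_next = 1 if rng.random() < P_TO_TARGET[(s, a)] else ABSORBING
    return REWARD_RATE[(s, a)], tau, s_next


def train(episodes=2000, alpha=0.1, gamma=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy SMDP Q-learning on the toy model; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(3)]
    for _ in range(episodes):
        s = 0
        while s != ABSORBING:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                a = max(range(2), key=lambda i: q[s][i])  # exploit
            rate, tau, s_next = toy_step(s, a, rng)
            reward = discounted_reward(rate, tau, gamma)
            smdp_q_update(q, s, a, reward, tau, s_next, alpha, gamma)
            s = s_next
    return q
```

Calling train() returns a 3-by-2 table of learned action values. The learning rate alpha and exploration rate epsilon are fixed here for brevity; the abstract's remark about a prudent choice of both suggests that decaying schedules would be the natural next refinement.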